1 Trace-driven memory simulation: a survey
Richard A. Uhlig, Trevor N. Mudge 



June 1997 ACM Computing Surveys (CSUR), Volume 29 Issue 2 



As the gap between processor and memory speeds continues to widen, methods for 
evaluating memory system designs before they are implemented in hardware are becoming 
increasingly important. One such method, trace-driven memory simulation, has been the 
subject of intense interest among researchers and has, as a result, enjoyed rapid 
development and substantial improvements during the past decade. This article surveys 
and analyzes these developments by establishing criteria for evaluating trac ... 

Keywords: TLBs, caches, memory management, memory simulation, trace-driven 
simulation 



2 Implementation aspects of a SPARC V9 complete machine simulator
Bill Clarke, Adam Czezowski, Peter Strazdins 

January 2002 Australian Computer Science Communications, Proceedings of the
twenty-fifth Australasian conference on Computer science - Volume 4,
Volume 24 Issue 1

In this paper we present work in progress in the development of a complete machine 
simulator for the UltraSPARC, an implementation of the SPARC V9 architecture. The 
complexity of the UltraSPARC ISA presents many challenges in developing a reliable and yet 
reasonably efficient implementation of such a simulator. Our implementation includes a 
heavily object-oriented design for the simulator modules and infrastructure, caching of 
repeated computations for performance, adding an OS (system call) emu ... 

Keywords: SMP, SPARC V9 ISA, UltraSPARC, complete machine simulator, execution- 
driven simulation, object-oriented design 

Ian Pratt, Andrew Warfield

October 2003 Proceedings of the nineteenth ACM symposium on Operating systems 
principles 


Numerous systems have been designed which use virtualization to subdivide the ample 
resources of a modern computer. Some require specialized hardware, or cannot support 
commodity operating systems. Some target 100% binary compatibility at the expense of 
performance. Others sacrifice security or functionality for speed. Few offer resource 
isolation or performance guarantees; most provide only best-effort provisioning, risking 
denial of service. This paper presents Xen, an x86 virtual machine monit ...

Keywords: hypervisors, paravirtualization, virtual machine monitors 

Edouard Bugnion, Scott Devine, Kinshuk Govil, Mendel Rosenblum

November 1997 ACM Transactions on Computer Systems (TOCS), Volume 15 Issue 4



In this article we examine the problem of extending modern operating systems to run 
efficiently on large-scale shared-memory multiprocessors without a large implementation 
effort. Our approach brings back an idea popular in the 1970s: virtual machine monitors. 
We use virtual machines to run multiple commodity operating systems on a scalable 
multiprocessor. This solution addresses many of the challenges facing the system software 
for these machines. We demonstrate our approach with a prototy ... 

Keywords: scalable multiprocessors, virtual machines 

Edouard Bugnion, Scott Devine, Mendel Rosenblum

October 1997 ACM SIGOPS Operating Systems Review, Proceedings of the sixteenth
ACM symposium on Operating systems principles, Volume 31 Issue 5



6 Shade: a fast instruction-set simulator for execution profiling
Bob Cmelik, David Keppel 

May 1994 ACM SIGMETRICS Performance Evaluation Review, Proceedings of the 1994
ACM SIGMETRICS conference on Measurement and modeling of computer
systems, Volume 22 Issue 1

Tracing tools are used widely to help analyze, design, and tune both hardware and software 
systems. This paper describes a tool called Shade which combines efficient instruction-set 
simulation with a flexible, extensible trace generation capability. Efficiency is achieved by 
dynamically compiling and caching code to simulate and trace the application program. The 
user may control the extent of tracing in a variety of ways; arbitrarily detailed application 
state information may be collected ... 

7 DELI: a new run-time control point
Giuseppe Desoli, Nikolay Mateev, Evelyn Duesterwald, Paolo Faraboschi, Joseph A. Fisher 

November 2002 Proceedings of the 35th annual ACM/IEEE international symposium on
Microarchitecture

The Dynamic Execution Layer Interface (DELI) offers the following unique capability: it 
provides fine-grain control over the execution of programs, by allowing its clients to 
observe and optionally manipulate every single instruction, at run time, just before it
runs. DELI accomplishes this by opening up an interface to the layer between the execution 
of software and hardware. To avoid the slowdown, DELI caches a private copy of the 
executed code and always runs out of its own private cache. In ... 



8 Binary translation and architecture convergence issues for IBM System/390
Michael Gschwind, Kemal Ebcioglu, Erik Altman, Sumedh Sathaye 
May 2000 Proceedings of the 14th international conference on Supercomputing 

We describe the design issues in an implementation of the ESA/390 architecture based on 
binary translation to a very long instruction word (VLIW) processor. During binary 
translation, complex ESA/390 instructions are decomposed into instruction "primitives" 
which are then scheduled onto a wide-issue machine. The aim is to achieve high instruction 
level parallelism due to the increased scheduling and optimization opportunities which can 
be exploited by binary translation software ... 




9 Virtual machines: Scale and performance in the Denali isolation kernel
Andrew Whitaker, Marianne Shaw, Steven D. Gribble 

December 2002 ACM SIGOPS Operating Systems Review, Volume 36 Issue SI 

This paper describes the Denali isolation kernel, an operating system architecture that 
safely multiplexes a large number of untrusted Internet services on shared hardware. 
Denali's goal is to allow new Internet services to be "pushed" into third party infrastructure, 
relieving Internet service authors from the burden of acquiring and maintaining physical 
infrastructure. Our isolation kernel exposes a virtual machine abstraction, but unlike 
conventional virtual machine monitors, Denali does not ... 

10 Soft timers: efficient microsecond software timer support for network processing
Mohit Aron, Peter Druschel 

August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 3

This paper proposes and evaluates soft timers, a new operating system facility that allows 
the efficient scheduling of software events at a granularity down to tens of microseconds.
Soft timers can be used to avoid interrupts and reduce context switches associated with 
network processing, without sacrificing low communication delays. More specifically, soft 
timers enable transport protocols like TCP to efficiently perform rate-based clocking of 
packet transmissions. Experiments indicate that ... 

Parallel workstations, each comprising tens of processors based on shared memory, 
promise cost-effective scalable multiprocessing. This article explores the coupling of such 
small- to medium-scale shared-memory multiprocessors through software over a local area 
network to synthesize larger shared-memory systems. We call these systems Distributed 
Shared-memory Multiprocessors (DSMPs). This article introduces the design of a shared- 
memory system that uses multiple granularities of sharing, ca ... 

Keywords: distributed memory, symmetric multiprocessors, system of systems 




12 The K2 parallel processor: architecture and hardware implementation
Marco Annaratone, Marco Fillo, Kiyoshi Nakabayashi, Marc Viredaz 

May 1990 ACM SIGARCH Computer Architecture News, Proceedings of the 17th
annual international symposium on Computer Architecture, Volume 18 Issue 3

K2 is a distributed-memory parallel processor designed to support a multi-user, multi- 
tasking, time-sharing operating system and an automatically parallelizing FORTRAN 
compiler. This paper presents the architecture and the hardware implementation of K2, and 
focuses on the architectural features required by the operating system and the compiler. A 
prototype machine with 24 processors is currently being developed. 

13 Exokernel: an operating system architecture for application-level resource 
management 

D. R. Engler, M. F. Kaashoek, J. O'Toole

December 1995 ACM SIGOPS Operating Systems Review, Proceedings of the fifteenth
ACM symposium on Operating systems principles, Volume 29 Issue 5



14 Using the SimOS machine simulator to study complex computer systems

Mendel Rosenblum, Edouard Bugnion, Scott Devine, Stephen A. Herrod 

January 1997 ACM Transactions on Modeling and Computer Simulation (TOMACS),
Volume 7 Issue 1



Keywords: computer architecture, computer simulation, computer system performance 
analysis, operating systems 



15 Improving the reliability of commodity operating systems
Michael M. Swift, Brian N. Bershad, Henry M. Levy 

January 2005 ACM Transactions on Computer Systems (TOCS), Volume 23 Issue 1 

Despite decades of research in extensible operating system technology, extensions such as 
device drivers remain a significant cause of system failures. In Windows XP, for example, 
drivers account for 85% of recently reported failures. This article describes Nooks, a
reliability subsystem that seeks to greatly enhance operating system (OS) reliability by 
isolating the OS from driver failures. The Nooks approach is practical: rather than 
guaranteeing complete fault tolerance through ... 

16 An implementation and analysis of the virtual interface architecture
Philip Buonadonna, Andrew Geweke, David Culler 

November 1998 Proceedings of the 1998 ACM/IEEE conference on Supercomputing 
(CDROM) 


Rapid developments in networking technology and a rise in clustered computing have 
driven research studies in high performance communication architectures. In an effort to 
standardize the work in this area, industry leaders have developed the Virtual Interface 
Architecture (VIA) specification. This architecture seeks to provide an operating system- 
independent infrastructure for high-performance user-level networking in a generic 
environment. This paper evaluates the inherent costs and performanc ... 

Keywords: cluster, interconnect, network, system-area, user-level, virtual interface 
architecture 




17 Soft timers: efficient microsecond software timer support for network processing 
Mohit Aron, Peter Druschel 

December 1999 ACM SIGOPS Operating Systems Review, Proceedings of the
seventeenth ACM symposium on Operating systems principles, Volume 33
Issue 5

This paper proposes and evaluates soft timers, a new operating system facility that allows 
the efficient scheduling of software events at a granularity down to tens of microseconds. 
Soft timers can be used to avoid interrupts and reduce context switches associated with 
network processing without sacrificing low communication delays. More specifically, soft 
timers enable transport protocols like TCP to efficiently perform rate-based clocking of 
packet transmissions. Experiments show that rate-base ... 

David Hovemeyer, Jeffrey K. Hollingsworth, Bobby Bhattacharjee 

March 2004 ACM SIGCSE Bulletin, Proceedings of the 35th SIGCSE technical
symposium on Computer science education, Volume 36 Issue 1

Undergraduate operating systems courses are generally taught using one of two 
approaches: abstract or concrete. In the abstract approach, students learn the concepts
underlying operating systems theory, and perhaps apply them using user-level threads in a 
host operating system. In the concrete approach, students apply concepts by working on a 
real operating system kernel. In the purest manifestation of the concrete approach, 
students implement operating system projects that run on ... 

Keywords: education, emulation, hardware, operating systems 

Recent advances in Field-Programmable Gate Arrays (FPGA) and programmable 
interconnects have made it possible to build efficient hardware emulation engines. In 
addition, improvements in Computer-Aided Design (CAD) tools, mainly in synthesis tools, 
greatly simplify the design of large circuits. The RPM (Rapid Prototype Engine for 
Multiprocessors) Project leverages these two technological advances. Its goal is to develop 
a common hardware platform forth ... 

Keywords: field-programmable gate arrays, logic emulation, message-passing 
multicomputers, rapid prototyping, shared-memory multiprocessors 